Chapter 6

This is a poorly organized collection of my solutions to the Chapter 6 exercises in Doing Bayesian Data Analysis. It's been a while since I blogged so I thought I would put this up. Maybe it will at least serve to give others another perspective on these problems.


In [1]:
%load_ext rpy2.ipython

In [119]:
%%R
setwd("/Users/justin/Dropbox/Articles/databozo/data/ProgramsDoingBayesianDataAnalysis/")
source('BernBeta.R')
source('BernGrid.R')

In [13]:
%%R
BernBeta(c(4,4), c(1))


Error in dev.off() : 
  QuartzBitmap_Output - unable to open file '/tmp/tmpKAfz_Z/Rplots001.png'
[1] 5 4

In [150]:
%%R
exercise_6_8_1_a <- function(){
    nIntervals = 10
    width=1/nIntervals
    Theta=seq(from=width/2, to=1-width/2, by=width)
    approxMass = dbeta(Theta, 8, 4)*width
    pTheta = approxMass/sum(approxMass)
    sum(approxMass)
}
exercise_6_8_1_a()


[1] 0.9991421

The sum isn't exactly one simply because we have a discrete and thus approximate answer. The more buckets we add though the close we will get to one. For example:


In [152]:
%%R
exercise_6_8_1_a_extra <- function(){
    nIntervals = 1000
    width=1/nIntervals
    Theta=seq(from=width/2, to=1-width/2, by=width)
    approxMass = dbeta(Theta, 8, 4)*width
    pTheta = approxMass/sum(approxMass)
    sum(approxMass)
}
exercise_6_8_1_a_extra()


[1] 1

Exercise 6.8.1

The range given doesn't make sense since the bins of the distribution will go from 0+width to 1+width which means we're counting >100% being possible and it isn't.


In [35]:
%%R
plot(approxMass,type='bar')



In [36]:
%%R
Theta=seq(from=0, to=1, by=width)
approxMass = dbeta(Theta, 8, 4)*width
pTheta = approxMass/sum(approxMass)

In [38]:
%%R
sum(approxMass)


[1] 1.000992

In [37]:
%%R
plot(approxMass,type='bar')


Exercise 6.8.2 A

Assuming I think the coin might be equally likely to be weighted towards heads/tails or might be fair.


In [66]:
%%R
pTheta = c(50:1, 1:50, 50:1, 1:50)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)

In [67]:
%%R
plot(pTheta)


Exercise 6.8.2 B

Given data where I flipped the coin 20 times and got 15 heads.


In [68]:
%%R
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 15),rep(0,5)))


Exercise 6.8.3 A

What would that posterior look like if I only flipped 4 times and got 3 heads?


In [69]:
%%R
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 3),rep(0,1)))


Exercise 6.8.3 B

Then I flipped it 16 more times with 12 heads.


In [70]:
%%R
pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 12),rep(0,4)))


Then just for fun let's see what happens if I give a ton more data.


In [71]:
%%R
pThetaGivenDataNextNext = BernGrid(Theta, pThetaGivenDataNext, c(rep(1, 121),rep(0,40)))


Exercise 6.8.4 A

Which candidate is most preferred? Can we say? Assume that in a poll of 100 people, 58 preferred candidate A. What is the HDI?


In [81]:
%%R
exercise_6_4_a <- function()
    pTheta = rep(1,101)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 58),rep(0,42))) 

exercise_6_4_a()



In [83]:
%%R
exercise_6_4_a_2 <- function()
    pTheta = rep(1,101)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 42),rep(0,58))) 

exercise_6_4_a_2()


Candidate A is preferred by 48% to 66.8% of people

Candidate B is preferred by 33.2% to 52% of people

Exercise 6.8.4 B

We can't actually say either candidate is more preferred than the other however. This is due to 50% being in both HDIs.

Exercise 6.8.4 C

Assuming we re-run the poll and add more data (57 more people prefer Candidate A over B out of 100). Now what is the HDI?


In [129]:
%%R
exercise_6_4_c <- function(){
    pTheta = rep(1,1001)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 58),rep(0,42))) 
    pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 57),rep(0,43))) 
}
exercise_6_4_c()


[1] 484
[1] 484 674
[1] 507
[1] 507 643

Now our HDI is 51% to 63.9% for Candidate A.

Exercise 6.8.4 D

With this updated data it is technically credible that Candidate A is more preferred than Candidate B. I say technically because in this case 50% is just barely outside our interval.

Exercise 6.5

Is it credible to believe that the defect rate is less than 10%?


In [128]:
%%R
exercise_6_5 = function(){
    pTheta = rep(1,1001)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 28),rep(0, 472)))
    return(pThetaGivenData)
}
given = exercise_6_5()


[1] 39
[1] 39 79

I used a non-informative prior since we have no prior information. Even doing this, which should be the most conservative estimate, it looks like the HDI is 5-8%. Since 10% is outside of the credibility interval it is not credible to believe that the process is outside of control limits.

Exercise 6.8.6


In [126]:
%%R
exercise_6_6 <- function(){
    pTheta = seq(from=0, to=1001, by=1)^2
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 2),rep(0, 2)))
}
exercise_6_6()


[1] 317
[1] 317 922

Exercise 6.7


In [130]:
%%R
exercise_6_7_1 <- function(){
    pTheta = seq(from=0, to=1001, by=1)^2
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 6),rep(0, 2)))
}
exercise_6_7_2 <- function(){
    pTheta = (1001-seq(from=0, to=1001, by=1))^2
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 6),rep(0, 2)))
}
exercise_6_7_1()
exercise_6_7_2()


[1] 518
[1] 518 962
[1] 319
[1] 319 843

The ratio of posterior beliefs is 4.66 (optimistic model to pessimistic).

Exercise 6.8.8 A


In [154]:
%%R
exercise_6_8_a <- function(){
    pTheta = c(rep(1,51),49:1)# seq(from=0, to=1001, by=1)
    pTheta = pTheta / sum(pTheta)
    plot(pTheta)
}
exercise_6_8_a()


I created this prior as a way of showing that I am very optimistic in the drug giving us SOME effect... the amount of effect declines until it's minimal at 100%. This is also a fairly unusual prior and I'm curious how the rest of the exercise turns out as a result. If I were to make this better, I'd probably give it a peak in the middle so it bulges between 50 and 100% instead of just declining.


In [155]:
%%R
exercise_6_8_b = function(){
    pTheta = c(rep(1,501),500:1)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 30),rep(0, 20)))
    return(pThetaGivenData)
}
given = exercise_6_8_b()


[1] 502
[1] 502 698

My reasoning for this prior is that we have a very limited belief that the drug will decrease the birth of boys. Really only the magnitude of the effect of more boys is what we doubt. As an example: this is what would happen if the experiment results were reversed.


In [156]:
%%R
exercise_6_8_b_2 = function(){
    pTheta = c(rep(1,501),500:1)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 20),rep(0, 30)))
    return(pThetaGivenData)
}
given = exercise_6_8_b_2()


[1] 502
[1] 502 597

At first when I saw this, I wondered if anything could shift the likelihood to the left. Suppose the experiment had 5 times the data. Then we see this:


In [157]:
%%R
exercise_6_8_b_3 = function(){
    pTheta = c(rep(1,501),500:1)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 100),rep(0, 150)))
    return(pThetaGivenData)
}
given = exercise_6_8_b_3()


[1] 343
[1] 343 461
[1] 343 461 502
[1] 343 461 502 527

Finally the data overwhelms the prior and causes an almost split personality. Our HDI is actually two intervals: 34.3%-46.1% and 50.2%-52.7%. It is 95% likely that true effect lies in one of these two intervals. The single most likely value however is 50.2%. Just slightly better than random.

Exercise 6.8.8 C


In [149]:
%%R
exercise_6_8_c_company = function(){
    print("Company subjective beliefs")
    pTheta = rep(50, 1001)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 60),rep(0, 40)))
    pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 30),rep(0, 20)))
    return(pThetaGivenDataNext)
}
exercise_6_8_c_skeptic = function(){
    print("Skeptic subjective beliefs")
    pTheta = rep(50, 1001)
    pTheta = pTheta / sum(pTheta)
    width = 1/length(pTheta)
    Theta = seq(from=width/2, to=1-width/2, by=width)
    pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 50),rep(0, 50)))
    pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 30),rep(0, 20)))
    return(pThetaGivenDataNext)
}
given = exercise_6_8_c_company()
given_2 = exercise_6_8_c_skeptic()


[1] "Company subjective beliefs"
[1] 504
[1] 504 693
[1] 522
[1] 522 677
[1] "Skeptic subjective beliefs"
[1] 405
[1] 405 597
[1] 455
[1] 455 613

The data wouldn't be enough to move the skeptic. The company would have enough data to believe that their drug works however. It's an interesting mathematical example of the effects of bias in data analysis.

Homework complete!

That's all of the exercises for Chapter 6. I'm done for the week. The next chapter I'm working on is chapter 7 which gives me the first taste of MCMC in the book. Very excited!


In [ ]: